Abstract

The problem of community detection is relevant in many scientific
disciplines, from social science to statistical physics. Given the impact
of community detection in many areas, such as psychology and the social
sciences, we address the issue of modifying existing well-performing
algorithms by incorporating elements of the application domain, i.e., a
domain-inspired approach. We focus on a psychology- and
social-network-inspired approach, which may be useful for further
strengthening the link between social network studies and the mathematics
of community detection. Here we introduce a community-detection algorithm
derived from van Dongen's Markov Cluster algorithm (MCL) [5] by
considering the nodes of the network as agents capable of making
decisions. In this framework we introduce a memory factor to mimic
typical human behavior such as the oblivion effect. The method is based on
information diffusion and includes a non-linear processing phase. We test
our method on two classical community benchmarks and on
computer-generated networks with known community structure. Our approach
has three important features: the capacity to detect overlapping
communities, the capability of identifying communities from an individual
point of view, and the fine tuning of community detectability with
respect to prior knowledge of the data. Finally, we discuss how to use a
Shannon entropy measure for parameter estimation in complex networks.

1 Introduction 

Detecting communities is a task of great importance in many disciplines,
namely sociology, biology and computer science [H EJ [TSl [20] [22], where
systems are often represented as graphs. Community detection is also
linked to the clustering of data: many clustering methods establish links
among representative points that are nearer than a given threshold, and
then proceed to identify communities on the resulting graphs [2], [3].
Given a graph, in a broad sense, a community is a group of vertices that
are "more linked" to each other than to the rest of the graph. This is
clearly a poor definition, and indeed, on a connected graph, there is no
clear distinction between a community and the rest of the graph. In
general, there is a continuum of nested communities whose boundaries are
somewhat arbitrary: the structure of communities can be seen as a
hierarchical dendrogram [15].

In general, community detection algorithms rely on global quantities like 
betweenness, centrality, etc. [HI US] and most algorithms require the graph 
to be completely known. This constraint is problematic for networks like 
the World Wide Web, which for all practical purposes is too large and too 
dynamic to ever be fully known. 

Moreover, in complex networks, and in particular in social networks, it is
very difficult to give a clear definition of community: nodes often belong
to more than one cluster, module or community, which results in
overlapping communities. The problem of overlapping communities was
discussed in [17], and a solution was recently presented in [11]. For
instance, people usually belong to different communities at the same time,
depending on their families, friends, colleagues, etc.: each person,
running a subjective community-detection algorithm, has his or her own
vision of the communities in his or her social environment.

In social networks, the definition of a community could be linked to the
human capability of information processing, particularly the poor
evaluation of probabilities. When faced with insufficient data or
insufficient time for rational processing, we humans have developed
algorithms, denoted heuristics, that allow us to make decisions in these
situations. The modern approach to the study of cognitive heuristics
defines them as those strategies that prevent one from finding out or
discovering correct answers to problems that are assumed to be in the
domain of probability theory. The rationale for a cognitive algorithm for
community detection is that human networks are the result of the
individual strategies of single subjects; on the other hand, these
strategies are presumably shaped and evolved by the social structures in
which the subjects live [U [10].

The paper is organized as follows: we start by describing a new algorithm
for detecting communities in complex networks in Section 2. Following the
psychological notions mentioned above, we adopt a local algorithm in which
an individual is simply modeled as a memory and a set of connections to
other individuals. The "learning" (nonlinear) phase is modeled after
competition in the chemical/ecological world, where resources fight each
other in order not to fall into oblivion. In Section 3 we describe the
first algorithm, in which information about neighboring nodes is
propagated and elaborated locally, but the connections do not change.
Here we want to emphasize not only the good efficiency of the algorithm in
detecting communities but also its capability to discover overlapping
nodes and a sort of subjective vision of the hierarchical levels of the
network. Next, in Section 4 we give an interpretation of the Shannon
entropy of information as a quality function for estimating the model
parameters. Finally, we discuss our results and propose future steps in
the Conclusions.



2 Competition process

We consider N individuals, labeled from 1 to N. Let us denote by A the
adjacency matrix: $A_{ij} = 1$ (0) indicates the presence (absence) of a
link from site j to site i; all links have the same weight (Figure 1).
Each individual i is characterized by a state vector $S_i$, representing
his knowledge of the outer world. We interpret $S_i$ as a probability
distribution, assuming that $S_i^{(k)}$ is the probability that individual
i belongs to the community k. Thus, $S_i^{(k)}$ is normalized on the index
k. We shall denote by $S = S(t)$ the state of the whole network at time t,
with $S_{ik} = S_i^{(k)}$. We shall initialize the system by setting
$S_{ij}(0) = \delta_{ij}$, where $\delta$ is the Kronecker delta,
$\delta_{ij} = 1$ if $i = j$ and zero otherwise. In other words, at time 0
each node only knows about itself.

Figure 1: An example of adjacency matrix A. It is a three-level matrix
composed of 4 blocks of 2 sub-communities of 8 nodes each. The link
probability inside a sub-community is 0.98, inside first-level blocks it
is 0.3, and among blocks it is 0.03. White points indicate the presence of
a link between node i and node j, $A_{ij} = 1$.

As mentioned, the competition phase is modeled on a chemical/ecological
analogy. Our algorithms are based on the concept of diffusion and
competitive interaction in network structures introduced by Nicosia et
al. [16].

If two populations x and y are in competition for a given resource, their
total abundance is limited [13]. After normalization, we can assume
$x + y = 1$, i.e., x and y are the frequencies of the two species, and
$y = 1 - x$. The reproductive step is given by $x' = f(x)$, which we
assume to be represented by a power, $x' = x^a$. For instance, a = 2
models the birth of individuals of a new generation after binary
encounters of individuals belonging to the old generation, with
non-overlapping generations (egg laying) [1].

After normalization we obtain:

$$ x' = \frac{x^a}{x^a + y^a} = \frac{x^a}{x^a + (1 - x)^a}. \tag{1} $$

Introducing $z = (1/x) - 1$ ($0 < z < \infty$), we get the map

$$ z(t + 1) = z^a(t), \tag{2} $$

whose fixed points (for $a > 1$) are 0 and $\infty$ (stable attractors)
and 1 (unstable), which separates the basins of the two attractors. Thus,
the initial value of x, $x_0$, determines the asymptotic value: for
$0 < x_0 < 1/2$, $x(t \to \infty) = 0$, and for $1/2 < x_0 < 1$,
$x(t \to \infty) = 1$.
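As a minimal numerical sketch of the two-species competition map (the function names, the starting values 0.45, 0.55 and 0.5, and the number of steps are illustrative assumptions, not part of the model), one can iterate the normalized power step and check on which side of 1/2 the frequency ends up:

```python
def compete(x, a):
    """One competition step: x' = x**a / (x**a + (1 - x)**a)."""
    xa = x ** a
    ya = (1.0 - x) ** a
    return xa / (xa + ya)


def iterate(x0, a, steps=60):
    """Apply the competition step `steps` times starting from x0."""
    x = x0
    for _ in range(steps):
        x = compete(x, a)
    return x


low = iterate(0.45, a=2.0)   # starts below 1/2
high = iterate(0.55, a=2.0)  # starts above 1/2
mid = iterate(0.5, a=2.0)    # starts on the unstable fixed point x = 1/2
print(low, high, mid)
```

The starting value 1/2 corresponds to z = 1, so it stays put only because no perturbation is applied; any small deviation is amplified towards one of the two attractors.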

By extending to a larger number of components for a probability
distribution $P_i$, the competition dynamics becomes

$$ P_i' = \frac{P_i^a}{\sum_j P_j^a}, \tag{3} $$

and the iteration of this mapping, for $a > 1$, leads to a Kronecker
delta, corresponding to the largest component. However, the alternation
between information and competition can generate interesting behaviors.
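The collapse onto the largest component can be illustrated with a short sketch. The initial distribution, the exponent a = 1.4 and the number of iterations below are hypothetical choices made only for illustration:

```python
import numpy as np


def compete(p, a):
    """One competition step on a probability vector: p_i' = p_i**a / sum_j p_j**a."""
    pa = np.power(p, a)
    return pa / pa.sum()


p = np.array([0.30, 0.25, 0.35, 0.10])  # hypothetical initial distribution
for _ in range(200):
    p = compete(p, a=1.4)

print(np.round(p, 6))
```

The component that starts largest (index 2 here) absorbs the whole probability mass, while the vector stays normalized at every step.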



3 Information Dynamics Algorithm 

The dynamics of the network is given by an alternation of communication 
and elaboration phases. Communication is implemented as a simple diffusion 
process, with memory m. The memory parameter m allows us to introduce 
some limitations in human cognition such as the mechanism of oblivion and 
the timing effects: the most recent information has more relevance than the 
previous one [71 El] . 

We assume that each individual spends the same amount of time in
communication, so that people with more connections dedicate less time to
each of them. Since the amount of available time is limited, we normalize
the adjacency matrix on the columns (i.e., we assign to each link the
inverse of the out-degree of its source node), forming a Markov matrix M,

$$ M_{ij} = \frac{A_{ij}}{\sum_k A_{kj}}. \tag{4} $$

Note that in many mathematical texts the indices are inverted, so that
Markov matrices are normalized on the rows. We prefer the "physics"
notation, so that matrix multiplication with a probability distribution P
takes the usual form $P' = MP$. Then, in the communication phase, the
state of the system evolves as

$$ S(t + 1/2) = m S(t) + (1 - m) M S(t). \tag{5} $$
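A minimal sketch of the communication phase, Eq. (5), may help fix the index conventions. The 4-node path graph, the helper names and the value m = 0.3 are assumptions chosen for illustration only:

```python
import numpy as np


def markov_matrix(A):
    """Column-normalise A: M[i, j] = A[i, j] / sum_k A[k, j]."""
    col_sums = A.sum(axis=0, keepdims=True)
    if np.any(col_sums == 0):
        raise ValueError("every node needs at least one outgoing link")
    return A / col_sums


def communicate(S, M, m):
    """Communication phase, Eq. (5): S(t + 1/2) = m S(t) + (1 - m) M S(t)."""
    return m * S + (1.0 - m) * (M @ S)


# Hypothetical undirected path graph 0-1-2-3, used only as an example.
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
M = markov_matrix(A)
S0 = np.eye(4)                  # S_ij(0) = delta_ij: each node knows only itself
S_half = communicate(S0, M, m=0.3)
print(S_half)
```

Because M is normalized on the columns, the diffusion step conserves the column sums of S (the total weight attached to each label), whereas the rows are in general no longer normalized until the competition phase is applied.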

As described in Eq. (3), the competition phase is modeled as a competitive
interaction between the nodes in the network [16]. In this way the
dynamics of the model is given by the sequence
$S(t) \to S(t + 1/2) \to S(t + 1)$:

$$ S_{ik}(t + 1/2) = m S_{ik}(t) + (1 - m) \sum_j M_{ij} S_{jk}(t), $$

$$ S_{ik}(t + 1) = \frac{S_{ik}^a(t + 1/2)}{\sum_j S_{ij}^a(t + 1/2)}. \tag{6} $$
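The full update of Eq. (6) can be sketched in a few lines. This is only an illustrative reading of the equations, not a reference implementation: the function names, the two-clique test graph, and the values m = 0.3, a = 1.4 and 50 iterations are assumptions introduced here.

```python
import numpy as np


def information_dynamics(A, m, a, steps):
    """Alternate communication and competition, Eq. (6), for `steps` iterations."""
    n = A.shape[0]
    M = A / A.sum(axis=0, keepdims=True)      # column-normalised Markov matrix
    S = np.eye(n)                             # S_ij(0) = delta_ij
    for _ in range(steps):
        S = m * S + (1.0 - m) * (M @ S)       # communication: S(t + 1/2)
        S = np.power(S, a)                    # competition: raise to the power a ...
        S = S / S.sum(axis=1, keepdims=True)  # ... and normalise over the index k
    return S


def two_cliques(size):
    """Hypothetical test graph: two cliques joined by a single bridge link."""
    n = 2 * size
    A = np.zeros((n, n))
    A[:size, :size] = 1.0
    A[size:, size:] = 1.0
    np.fill_diagonal(A, 0.0)
    A[size - 1, size] = A[size, size - 1] = 1.0
    return A


S = information_dynamics(two_cliques(4), m=0.3, a=1.4, steps=50)
print(np.round(S, 3))
```

Each row of the returned matrix is a probability distribution over labels, as required by the interpretation of the state vectors; which labels survive depends on the choice of m and a.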



We assume that individuals' memory is large enough that they can keep
track of all information about all other individuals. In a real case, one
should limit this memory and apply input/output filtering. Individuals do
not change their connectivity. For testing purposes we use three networks
to analyze and discuss the peculiarities of our model. The three case
studies, of growing or different complexity, are: a simple artificial
network used to show the typical output of our algorithm, the Zachary
karate club network [23] and the bottlenose dolphin network [12].

3.1 Simple artificial network 




Figure 2: Simple artificial network composed of 9 nodes and 13 links
divided into 2 communities. It is possible to identify two different
communities: the first one composed of nodes 1-2-3-4-5 and the second one
of nodes 6-7-8-9.

In this first case study the algorithm faces a very simple task and
converges to an optimal solution in a few iterations and for a wide range
of the model parameters m and a. Analyzing the state matrix S(t), it is
possible to identify two different communities marked by nodes 5 and 9.



In Figure 3(b) it is possible to identify two different communities,
highlighted by the upper values in the graph. The first community is
composed of nodes 1-2-3-4-5 and the second one of nodes 6-7-8-9. Our
algorithm is also capable of detecting overlapping nodes (4 and 6) as
"middle" values between the blue lines. In this way each node knows
exactly its role in the network.







Figure 3: (a) On the x-axis of both panels is the node index. On the
y-axis: the cumulative distribution (dashed black line, $\sum_i S_{ij}$
for node j, multiplied by five) and the connectivity (blue line,
$\sum_i A_{ij}$). The information propagation algorithm identifies
communities by leaves (nodes 5 and 9, with lower connectivity) with
m = 0.3 and a = 1.4. (b) The values of the state vectors, at the final
asymptotic time, of node 5 (dashed black line) and node 9 (black line). We
can observe upper values identifying communities: the first one composed
of nodes 1-2-3-4-5 and the second one of nodes 6-7-8-9. The algorithm is
also capable of detecting the communication nodes 4 and 6 between the blue
lines. In this way we can identify the overlap between the communities and
also define a sort of objective vision of nodes. It is clear that the
upper nodes know very well which is their community, just as nodes 4 and 6
know that they are in a middle state between two communities.
3.2 Zachary "Karate Club"



The second test case is a typical example from the network literature: the
network proposed by Zachary in 1977, known as the "karate club" [23].
Although this network (Figure 4(a)) is rather small, our algorithm shows
interesting results. With m = 0.25 and a = 1.4 the algorithm detects three
communities in a few steps, as shown in Figure 4(b).



Figure 4: (a) Zachary's karate club network. (b) On the x-axis is the node
index. On the y-axis: the connectivity (blue line) and the cumulative
distribution (dashed black line) are reported at the final asymptotic time
with m = 0.2 and a = 1.4. The cumulative distribution reveals three
underlying substructures labeled by nodes 13, 17 and 23.





Figure 5: (a) Values of state vector 13: the community is defined by the
black nodes in Figure 5(b), corresponding to the upper values in Level 1.
As in the simple artificial network, the algorithm has detected not only
the right community but also the overlapping node, corresponding to the
big gray diamond in Figure 5(b), in Level 2. Level 3 and Level 4
(respectively gray and white nodes in Figure 5(b)) correspond to different
levels of knowledge of the others. In Level 3 it is possible to find
"direct" contacts, while in the fourth level it is possible to detect
"friends of my friends".





Figure 6: (a) Values of state vector 17: the community is defined by the
black nodes in Figure 6(b), corresponding to the upper values in Level 1.
For the description see the caption of Figure 5.

Figure 7: (a) Values of state vector 23: the community is defined by the
black nodes in Figure 7(b), corresponding to the upper values in Level 1.
In this case we have not found any overlap, but it is still possible to
detect a sort of scale of friendship inside the network, labeled by the 4
levels in Figure 7(a) and by the black-to-white color scale in
Figure 7(b).




Figure 8: Communities detected by our algorithm, labeled by dark gray
(shown in Figure 5), light gray (shown in Figure 6) and white nodes (shown
in Figure 7). The three circles represent the overlap between communities
due to the role of nodes 3 and 9, as explained in Figure 5 and Figure 6.



3.3 Bottlenose dolphin Network 

The third case study concerns a community network of dolphins. The network
we study was constructed from observations of a community of 62 bottlenose
dolphins over a period of seven years from 1994 to 2001 [12].







Figure 9: (a) Bottlenose dolphin network. This network has 62 nodes, and
from direct observation it is known that it has two communities. (b) On
the x-axis is the node index. On the y-axis: the connectivity (blue line)
and the cumulative distribution (dashed black line) are reported at the
final asymptotic time with m = 0.5 and a = 1.03. The cumulative
distribution reveals two underlying substructures labeled by nodes 59 and
61.




Figure 10: (a) Values of state vector number 59: the four levels indicate 
respectively nodes of community (white nodes), overlapping nodes (white 
diamond nodes), closer nodes of the other community (gray diamonds) and 
finally nodes of the other community (gray nodes). (b) Communities detected 
by our algorithm. 

Figure 11: (a) Values of state vector number 61: the four levels indicate 
respectively nodes of community (white nodes), overlapping nodes (white 
diamond nodes), closer nodes of the other community (gray diamonds) and 
finally nodes of the other community (gray nodes). (b) Communities detected 
by our algorithm. 

4 Entropy of information 

In order to present the temporal results in a compact way, we computed the
entropy E of a configuration S, using the cumulative distribution over the
non-normalized index,



Eis) ^ J- pWJ _ ^(1 _ ,og(i - 

i i 

The entropy E is maximal for the flat distribution, when each node knows
only itself, and minimal (zero) when the whole network has only one label
(or has become just one star for the rewiring algorithm). If the
population is evenly distributed among n clusters, the entropy is
E = log(n). Let us study the artificial complex network illustrated in
Figure 1. This network is composed of three levels with different link
probabilities in each region.
As we observed, our algorithm is able to detect all levels of a
hierarchical network. In Figure 12(a) it is possible to identify the final
level of the artificial network. The value of the entropy E(t) can help us
understand the structure of the network a priori. In fact, different
levels of a hierarchical structure are identified by the plateaux, as we
can observe in Figure 12(b). This result is emphasized by the first
derivative of the entropy, where we can observe three distinct peaks
(Figure 12(c)).
Figure 12: Referring to the artificial hierarchical network illustrated in
Figure 1: (a) the asymptotic configuration observed with our algorithm
using m = 0.27 and a = 1.25; (b) the entropy of information E vs. time: we
can observe three different plateaux. The final configuration, E = 0,
corresponds to the monocluster shown in (a); (c) plot of the first
derivative of the entropy, which shows the three plateaux with three
different peaks; (d) we observe that the final community is identified by
the higher connectivity.



Figure 13: The asymptotic configuration of the entropy as a function of
the parameters m and a. We can clearly observe many different surfaces in
this 3D graph: the surfaces correspond to different asymptotic
configurations.

Figure 14: (a) Asymptotic configuration of the matrix S using m = 0.7 and
a = 1.4 (this configuration corresponds to the large green surface in
Figure 13). We observe that the asymptotic result corresponds to the
middle layer (four communities) of the hierarchical structure of the
network; (b) We have identified all the different levels in a hierarchical
complex network by changing the parameters m and a.

5 Conclusion

In this paper we have described an algorithm to identify the community
structures in a network from a local point of view.

The method is based on pure information propagation, where the nonlinear
part of the method, responsible for the actual elaboration of information,
is inspired by a chemical/ecological competition model [TB].

There is no unique definition of a community, so an exploratory algorithm,
like the one that humans have presumably developed during their evolution,
should present different clusterings for different values of the
parameters, or for different iterations.

In this implementation we adopted a frequency-based approach and an
unbounded memory at the level of nodes. Unbounded memory means that the
node's state vector $S_i$ has not been limited and could potentially reach
a size equal to the network size N. Despite this, the elaboration and
normalization phases are sufficient to avoid this problem. Nevertheless,
it will be very important to limit the computational resources of the node
explicitly, as suggested by Simon in 1955 pTO], thereby increasing both
the ecological plausibility of the model and the insights which drive the
algorithm design.

The results that we have obtained are promising. The method under
investigation is not competitive with respect to others (see the review
[8]), but it provides a natural "scanning" of the various clustering
levels. Moreover, our method can be naturally applied to weighted graphs.
We have shown, through the definition of the entropy of information, that
our algorithm is efficient at discovering all cluster levels for general
networks.

We believe that the local algorithm procedure will not only allow us to
study much larger networks but also to mimic individual human behavior in
social networks through specific and simple heuristic decision rules. The
model parameters m and a play a crucial role in the detection of
communities. These results suggest how cognitive heuristics could be
designed as those mechanisms which allow humans to optimize those
parameters in order to maximize the information gathered from the
environment. Following this assumption, future work will investigate what
kind of computational procedures could be used to mimic this human
behavior. We plan to investigate the consequences of bounded memory and
computational resources of nodes, in particular in a dynamic environment.